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Abstract 



Controller design for systems typically faces a trade-off between robustness and performance, and the 
C) i reliability of linear controllers has caused many control practitioners to focus on the former. However, there is a 

renewed interest in improving system performance to deal with growing energy and pollution constraints. This 
paper describes a learning-based model predictive control (MPC) scheme. The MPC provides deterministic 
guarantees on robustness and safety, and the learning is used to identify richer models of the system to 
improve controller performance. Our scheme uses a linear model with bounds on its uncertainty to construct 

invariant sets which help to provide the guarantees, and it can be generalized to other classes of models and to 

> 

\^ — ' pseudo-spectral methods. This framework allows us to handle state and input constraints and optimize system 

—1. , performance with respect to a cost function. The learning occurs through the use of an oracle which returns 

^^ , the value and gradient of unmodeled dynamics at discrete points, and the oracle could be parametric or non- 

f— ^ ■ parametric statistical identification tools. Moreover, we show convergence of the control action determined 

by the MPC with oracle to the control action of an MPC that knows the unmodeled dynamics. To integrate 
learning where there is no prior knowledge about unmodeled dynamics with control, we construct a new non- 
k> , parametric estimator with properties that make it amenable to use with numerical optimization algorithms. 

S i Deterministic and probabilistic properties of this estimator are proven. 
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I. Introduction 

Ensuring stability of engineered systems is a central focus of control theory. A variety of classical 
and modem tools has been developed for this purpose, though they often have an inherent design 
trade-off. Safety and robustness can be guaranteed with approximate models, but optimal behavior 
(e.g., aggressive maneuvering or maneuvering to minimize energy consumption) requires accurate 
models. The trade-off involves the accuracy of models and controller design with the optimality of 
the system, and it has driven research in adaptive control [[Il-[l3l and learning -based control [Sll-Q. 

Adaptive control [[U, |l3l seeks to reduce conservatism by modifying controller parameters to match 
the closed-loop response to a reference model. This type of adaptive control is applicable to linear 
systems [[ll, [O and linearizable nonlinear systems [HI, [|2l where the nonlinearities are linear with 
respect to a set of unknown parameters. Adaptive control is successful in matching transient and 
stability properties of the closed-loop system to a desired response, and it has found application in 
many industrial domains including aerospace and robotics. These approaches do not consider state or 
input constraints, and the engineer cannot explicitly specify performance in terms of a cost function. 
There are also versions of adaptive control which use neural networks to identify nonlinear systems 
with specific structures [8|. 

On the other hand, learning-based control seeks to improve performance by using expert-guidance 
or iterated experiments. Iterative learning control [4J is designed for trajectory tracking, and it is 
most successful in situations where the system performs a repeated task. Expert-guided systems Q 
are gaining popularity because of their ability to replicate aggressive maneuvers. Unfortunately, it 
has been shown that the approaches which incorporate model learning cannot be used to guarantee 
safety flU, [fTOl . and so these systems rely on the expert-guided trajectories to guarantee safety. This 
is problematic because safety cannot be guaranteed for trajectories which deviate significantly far 
from those of the expert, except in special situations S. Methods based on randomized trees ^ 
have better safety properties, but they do not incorporate model learning. 

The motivation of this paper is to design a control scheme than can (a) handle state and input 
constraints, (b) optimize system performance with respect to a cost function, (c) use statistical 
identification tools to learn model uncertainties, and (d) provably converge. We are particularly 
interested in situations in which a parametrization of the uncertainty is not known, though our 
framework equally applies to the parametric case. The main challenge is combining (a) and (c): 
Learning methods converge in a stochastic sense, and this is not strong enough for the purpose of 
providing deterministic guarantees of safety. Showing (d) is also difficult because of the differences 
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between statistical and dynamical convergence. 

Model predictive control (MPC) explicitly considers (a) and (b), and so it is arguably the most 
natural framework. This paper develops an MPC scheme that incorporates learning. The main idea 
is that MPC only requires an oracle that returns information about the vector field at specific points, 
and not an analytical equation describing the vector field at all points. This insight is useful because 
it means that if we can construct an oracle which learns the model, then we can join learning and 
MPC. Nonparametric statistics [[T0ll - [[T2ll provide an established approach for building the oracle. 
The second idea is that maintaining safety requires a priori knowledge about the system. If we 
have a linear model of the system and bounds on its modeling error, then we can use existing MPC 
techniques to prove safety of an MPC scheme which uses the oracle. This is similar to |15J, though 
the MPC framework allows us to consider less restrictive model structures. 

A. Overview 

The paper is organized into two parts: The first describes a learning-based MPC with generic 
oracle and proves its deterministic safety and robustness, and the second defines parametric and 
nonparametric oracles and shows the convergence of the MPC scheme. Issues related to estimation 
and filtering are proved in the Appendix. We consider the control of a general, nonlinear system 
with noisy measurements. The nonlinear model is unknown, and we only have an approximate, 
linear model with bounds on its modeling error. 

The first section defines a learning-based MPC scheme that uses a generic oracle. An adaptive 
controller must be robust since it uses models with a stochastic or uncertain component, and so 
robustness of MPC is an important issue lfT3l . Robust MPC can be characterized into min-max lfT4l 
or invariant-set (i.e., tube-MPC) [fTSl - lfTTll approaches. We prefer tube-MPC because it requires less 
computation in exchange for some conservativeness. The linear dynamics and uncertainty are used 
to compute invariant sets which give a certificate of safety for our scheme. This safety guarantee is 
deterministic and valid for all oracles, even ones that are bad estimates of the unmodeled dynamics. 
We also show that our MPC scheme is robust. 

Our main insight is that performance and safety in MPC can be decoupled. We do this by making 
the MPC cost function dependent on the oracle dynamics to specify the performance of the system. 
On the other hand, safety is ensured by making the MPC constraints dependent on the linear dynamics 
and their uncertainty. As long as we can design a controller which provides safety for the system in the 
presence of the modeling errors, our learning-based MPC can be proven to be safe. Our decoupling 
approach is general, and identical learning-based MPC schemes can be developed for systems where 
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the appropriate invariant sets can be computed. Such sets can be computed for piecewise-affine 
systems lfT8l -[[2T ]| . and (nonlinear) continuous-time systems [[22 | - [|25l . It can also be applied to 
pseudo-spectral methods to add learning to them. 

In the second section, we discuss specific oracles and show convergence of the control law 
generated by our MPC scheme to that of an MPC that knows the true dynamics. There are two broad 
situations. The first is when the modeling uncertainties can be parametrized, and this is commonly 
the case in adaptive control [[I]|-[l3]|. The second is when uncertainties cannot be parametrized, either 
because of unknown phenomenon or budgetary and time constraints on engineers. This situation is 
interesting because it is when learning-based approaches are used. We use nonparametric statistical 
theory to define a new estimator which we show is well-behaved and convergent. There is a possibility 
of using neural networks as another learning method, but we do not consider that here. 

II. Preliminaries 

In this section, we define the notation, the model, and briefly summarize two estimation and 
filtering results needed for the paper. 

A. Notation 

1) Matrix notation: The j-th element of a vector v is denoted as Vj, though in some situations we 
use [v]j to prevent ambiguity. Furthermore, we use A' to denote the transpose of A. Marks above a 
variable are used to distinguish the state, output, and input of different models of the same system. 
The true system has state x, the estimated state is x, the linear model with disturbance has state 
X, and the model with oracle has state x. Similar marks are used for the corresponding inputs and 
outputs. 

2) Lyapunov-related functions: A function 7 : IR+ — > IR+ is type-/C if it is continuous, strictly 
increasing, and 7(0) = iH, ^\. A function /3 : M+ x M+ ^ M+ is type-/C£ if for each fixed 
t > 0, the function /3(-, t) is type-/C, and for each fixed s > 0, the function /3(s, ■) is decreasing and 
/3(s,t) ^0 as t^ cx) [27|. 

3) Set operations: The Minkowski sum ll28l of two sets U, V is defined d& U ®V = {u + v : u E 
U;v e V}, and their Pontryagin set difference [|28l, JlH is defined asUGV = {u : u®V C U}. The 
Pontryagin set difference is not symmetric, and so the order in which these operations are performed 
can make a difference. For the set U, the linear transformation of the set by matrix T is given by 
TU = {Tu:ue U}. 
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4) Asymptotic notation: For a deterministic sequence /^ and rate c^, the notation fk = 0{ck) 
means that there exists M > and A^ G N such that ||/fe|| < A//||cfc|| for all k > K. 

For a random variable fk, constant /, and rate c^; the notation |/fc — /| = Op{ck) means that given 
e > 0, there exists finite M and K such that PM/fc - /|/cfc > Mj < e, VA; > K. 

5) Epi-convergence definitions: We define a few terms related to the epi-convergence [|30ll . [ISTI of 
random functions. Let Epi /„(-, tu) = {{x, fi) : n > fn{x, to)} be the set of all points lying on or above 



the function fn{-,uj), denoted the epigraph of fn{-,uj). Its closure is Epi fn{-,ui). We denote the set 
of all compact subsets of W by C^. The a-neighborhood of X is UaX = {x : iniy^x \\x — y\\ < a}, 
and its closure is UaX. 

For two functions f,g : W ^W, we define Vi^,{f,g,X;io) = [Epr/(-,w) n (UiX x R)) \ 
U^ i Epi g{-, (jj) n (A" X M) j . This set contains each cluster point of the epi-graph of / with argument 
in U \X, but with distance at least e from the epi-graph of g restricted to X. 

The sequence {fn)nen is said to be a lower semicontinuous approximation in probability to /q at 
X (notation /„ ^-^^^ /o) if Ve > VAT G C^^^ : Hmn^^ ¥{w : Vi^,{fn, /o, A"; w) n AT ^ 0) = 0. 

B. Model 

For state vector x G W, control input u G M™, and output y eW^ the system dynamics are given 

by 

X = AcX + BcU + gc{x, u) 

(1) 

y = Cx 

where A^ B^ C are matrices of appropriate size and g^x, u) describes the unmodeled (possibly 
nonlinear) dynamics. We assume that the system has hard constraints on the states x E X and 
control inputs u G U, where X = {x eW : F.j.x < h^} and U = {u E M"* : F„u < /i„} are convex, 
compact poly topes that are described by matrices F^, Fu and vectors hx.hu- 

Model predictive control (MPC) provides piecewise constant inputs, and so we will also consider 
a sampled-data system version of the model. Define x\n] = x{nTu), where Tu is the sampling period 
and x{t) is the solution to ([T]) with input 

u{t) = u[n], Vt G [nT^, {n + 1)T,). (2) 
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This sampled-data model can be more concisely written as 

x\n + 1] = Ax\n] + Bu[n] + g{x[n], u[n]) 

(3) 
y[n\ = Cx[n], 

where A = e^"^*, B = J^ eMt-r) drBc, g{x[n],u[n]) = x[n + l] — {Ax[n] + Bu[n]). Note that g{x, u) 
is Lipschitz continuous with respect to (x, u) by the continuous dependence of ordinary differential 
equations (ODE's) with respect to initial conditions ||26l . 

We assume that the uncertainty of the discrete-time system g{x,u) is bounded and lies within a 
convex, compact polytope. Specifically, 



g{x,u)eW = {g:Fgg<hg} 



(4) 



for all (x,u) E {X,U), where Fg is a matrix and hg is a vector. This includes box-constraints 
||(7(a;,M)||oo < w- The intuition of this assumption is that the model represents a linear system with 
bounded uncertainty. 

The model in ([3]) is an exact discretization of the model in ([T]), and so they can be used inter- 
changeably as a representation of the true system. Depending on the context, one model is preferred 
to the other. The model in ([3]) is natural for the MPC setting, and it is the one used in the majority of 
the paper. The Appendix discusses estimation and filtering, and we use the continuous-time model © 
because our tools make use of nonparametric statistics, which are convergent in the continuous-time 
setting. 

C. Estimation and Filtering 

We summarize here the most important results from the Appendix, for establishing notation used 
in later sections and recognizing that the details are not important towards understanding the main 
contributions of this paper. 

Theorem [TT] proves that a filter cannot in general learn unmodeled dynamics while performing 
state-estimation, unless all states are measured. For instance, it applies when the unmodeled dynamics 
are of the form 7(x, u; X) + K, where i^ is a constant, non-zero vector and 7(0;, u; A) is a parametrized 
nonlinear function such that 7(x, u; Aq) = for some parameter value Aq. Note that it is possible 
to do both for systems with specific structure on the unmodeled dynamics. We use this result to 
motivate why we assume that all states are measured. 

State filtering is needed to deal with the case in which the measurements have noise, because such 
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noise leads to an errors-in-variables situation. This situation is problematic because it is known that 
standard estimates of unmodeled dynamics will be asymptotically incorrect [[32l . Control engineers 
have developed techniques, including the Kalman filter, to deal with this [|33l , ||34l . Our approach is 
to presmooth the data in time using a two-level sampling scheme, and this can be interpreted in an 
instrumental variables context ||33l , [|34ll . 

The other result of note is given in Lemma [6l which states that if measurement noise e[n] is 
bounded Ij < [e[n]]j < Uj, then we can estimate the state x[n] to ensure that it always lies in a fixed, 
bounded region away from the true state x[n]. In particular, we have that x[n] G x[n] © £, where 
£ = {e : Ij < [e]j < Uj} © (— {e : Ij < [e]j < Uj}). Note that this is a conservative result, and in 
practice the estimate x[n] will be closer to the true value x[n\ than the raw measurement. 

in. Learning-Based MPC 

In this section, we discuss several different models of the system. For notational clarity, we add 
marks above x,y,u to clearly delineate the state, output, and input of the different models. The true 
system ([3]) has state x, output y, and input u, and the estimated state is given by x. We denote the 
states for the system with an oracle as x, y, u, and its dynamics are given by 

x[n + 1] = Ax[n] + Bu[n] + On{x[n],u[n]), (5) 

where C„ is a time-varying oracle which returns an estimate of the unmodeled dynamics and A, B 
are taken from ([3]). 

An important case is when C„ is a disturbance d[n] G I^ for a bounded, convex polytope V. We 
model the unknown dynamics and measurement error as a bounded disturbance and use a linear model 
to construct sets such that the control generated by the nonlinear MPC with oracle is guaranteed to 
be safe. In this case, the state x, output y, and input u of the linear model have dynamics 

x[n + 1] = Ax[n] + Bu[n] + d[n\. (6) 



The terminal constraint set is typically used to guarantee both feasibility and convergence [|T3ll . 
[|35l : however, we advocate decoupling these two concepts. This is an important distinction to make, 
because there are many MPC schemes for which convergence is difficult to show, yet feasibility is 
much simpler to prove. From this viewpoint, feasibility is related to safety properties, and convergence 
is related to performance properties. 
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A. Feasible Set Point Tracking 

Because MPC results in a nonlinear controller, it has properties that a linear controller does 
not. Nevertheless, it is useful to talk about the feasible set of points which a linear controller can 
exactly track. This identification can be used to increase the feasible set of linear MPC [[36ll . to give 
convergence guarantees for some of the points trackable by the linear MPC (361, and to robustify 
linear MPC against bounded disturbances ifTSl . ifTTl . 

Consider the problem of tracking a linear system to output y^. The corresponding steady- state 
Xs,Us was given in [fTTl . Il36l . and it is the solution to 

Us = 0. (7) 

U/ 

These solutions can be parametrized, and this is useful because it can be used to reduce the number 
of free variables in the MPC optimization ifTTl . Il36l . 

Lemma 1 (Limon, et al. IfTTl , Il36l ). If {A,B) is stabilizable and rank{C) = q, then the set of all 
solutions to ((Zl) is given by Xg = A9, Ug = "^9, y^ = HO, where 9 E M™ and A, \&, 11 are full 
column-rank matrices with suitable dimensions. 

Proof: Because {A, B) is stabilizable, the PBH test implies that rank((y4 — I Bj) = n. Next, 
note that the block structure of the matrix in (|7]) implies that the last q columns are linearly 
independent from the other columns of the matrix. The rank of the matrix is n+q, and the rank-nullity 
theorem implies that the null- space has dimension m. 

Construct the matrix (a' ^' IT') G ]]j("+m+p)x™ by assigning to its columns a set of basis 
vectors for the null-space. ■ 

The control law 

u[n] = K{x[n] - Xs) + Us = Kx[n] + (^ - KA)9 (8) 

drives the state of the unconstrained, linear system to Xs, and this is formalized by the following 
corollary. 

Corollary 1 (Limon, et al. [fTTll , [[36ll ). If {A + BK) is Schur stable (i.e., all eigenvalues have 
magnitude strictly less than one), then the control input defined in ((S]) steers the system © to 
steady-state Xg = A9 with steady-input of Ug = "^9. 
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The previous result applies to unconstrained systems which no disturbances. The system in our 
setup has input and state constraints, and it has bounded disturbances on the dynamics. We must do 
additional work to characterize the points which can be feasibly (i.e., not violate constraints) tracked 
by a linear control. 

As in IfTTll , [[36ll , define the augmented system as 

^x[n + l]\ (a + BK B{m-KK)\ (x[n]\ (d[n]\ 

where the matrix in this equation is labeled Aa- This couples the dynamics of the linear system with 
trivial dynamics on the parametrization 6, and the reason for doing this is that it allows us to draw 
upon existing machinery for computing invariant sets ll29ll , [|37ll . 

The maximal output admissible disturbance invariant set fi C Af x M™ was defined in ||29l . It is 
a set of points such that any trajectory of the augmented system with initial condition chosen from 
this set remains within the set for any sequence of bounded disturbance, while satisfying constraints 
on the state and input. The component of the set is a parametrization of which points can feasibly 
be tracked using a linear controller. These properties are formalized as 

• disturbance invariance: AaVt © (2) x {0}) C Vt; 

. constraint satisfaction: Vl <Z {(x,9) -.x e X; M e X; Kx + {^ - KA)9 eU;^6 e U}. 

The set Vt has an infinite number of constraints in general, though arbitrarily good approxima- 
tions can be computed in a finite number of steps ll29l . Il36l . These approximations maintain both 
disturbance invariance and constraint satisfaction, and these are the properties which are used in the 
proofs for our MPC scheme. So even though the results are written for Vt, they equally hold true for 
appropriately computed approximations [|29l , ||36l . 

B. Robust MPC Scheme 

A linear MPC scheme for tracking was presented in [|36l . and it often enlarges the feasible domain 
of the MPC. A robust version was described in ifTTl . and it works by allowing the initial point of the 
MPC to vary within a tube around the true point ifTSl . lfT6l . This approach would be too conservative 
when used in conjunction with our oracle, and so we describe a related MPC scheme. We use the 
idea of disturbance sets and input-to- state stability to make the MPC robust. 

The idea of tube-MPC [fT5l - [fT7ll is that feedback K is used to control how large the effect of the 
disturbance can grow. Given a nominal trajectory, the true trajectory is guaranteed to lie within a 
tube that surrounds the nominal trajectory. The constraints X are shrunk by the width of the tube 
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TZi, so that if the nominal trajectory lies in A" TZi, then the true trajectory lies in X. The terminal 
constraint set is chosen to be the maximal output admissible disturbance invariant set fi in order to 
ensure that we can stay within it indefinitely. 

We consider a general MPC scheme and show that it has robust feasibility and convergence 
properties. The MPC is defined by the following optimization problem 

min ipmi0,3;[m], . . . ,x[m + A^],'u[m], . . . ,u[m + A^ — 1]) (10) 

ci],e 

s.t. a;[m] = x[m], a;[m] = x[m] (11) 

x[m + z] = Ax[m + z — 1] + Bu[m + i — 1] 

+ Om{x[m + i-l],u[m + i-l]) (12) 

x[m + i] = Axlm + i — 1] + Bu[in + i — 1] 
u[m + i — 1] = Kx[m + i — 1] + c[m + i — 1] 

x[m + i]e X elZi, u[m + i-l]eUe KUi (13) 

x[m + N]eXNeTlN 

{x[m + N],e) en 

for alH G / in the constraints, where / = {1, . . . , A^}; K is the feedback gain used to compute fi; 
TZi = 0*lo(^ + BKyV; Xjy = {x : 39 s.t. {x,9) E i7}; ^^ are functions which are continuous 
in their arguments; Om is an oracle. The constraints in this scheme are applied to the linear model 
Q and not to the oracle model ©, and they are taken from [fTSl . However, the same control u[-] is 
used in both models. Furthermore, define the value function l^(a;[m]) of this optimization problem 
to be the value of the objective [10] at its minimum. 

Note that the value function Vm, the cost function ipm, and the oracle Om are time-varying because 
they are functions of m. It is important that the oracle be allowed to be time-varying because the 
oracle is updated as time advances and more data is gathered. The oracle is generally a model 
computed by a learning or regression method, and it becomes more accurate as additional data is 
used to fit this model. 

Let A^m be a feasible point for the MPC scheme (flOl) with initial state x[m], and denote the 
minimizing point of (flOl) as A^^^. The states and inputs predicted by the linear model Q for point 
JUm are denoted x[m + i; Aim] and Um["m + i] M.m\, for i E I. This MPC scheme is endowed with 
robust feasibility properties. 
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Theorem 1. Assume g{x[n],u[n]) + e[n + 1] — Ae[n] G V and E S, where e[n],e[n + 1] E £ 
are state-estimation errors. If Aim = {c[fn], ■ ■ ■ , c[?7i + A^ — 1], 6m} is feasible for the general MPC 
scheme diOl ) with x[m\, then we have 

a) Robust feasibility: A^m+i = {c[m+l], . . . , c[m+N—l\, Kxmi^^+N] Jv{.m+i\ + {^ — K K)6m+i, Om+i}, 
where 6m+i is such that (x[m + 1 + A^ — 1; A^^+i], ^m+i) £ ^, is feasible for x[m + 1]; 

b) Robust constraint satisfaction: x[m + 1], x[m + 1] G A". 

Proof: The proof follows a similar line of reasoning as Lemma 7 of [fT5l . Let d[m + l + i;Aim] = 

{A+BKy{e[m+l]+g{x[m],u[m])-Ae[m]),andnotethatd[m+l+i;Mm] e {A+BKy+^V. Some 
algebra gives the predicted states and inputs for z = 0, . . . , A^ — 1 as x[m +l+i; Mm+i] = Xmim + 

1 + i; Mm] + d[m + 1 + i; Mm] and u[m + l + i;Mm+i] = Um[m + l + i;Mm] + Kd[m + l + i;Mm]- 
Because Mm is feasible, this means by definition that Xm["m + 1 + i; Mm\ G A:" 7?.j+i for i = 
0, . . . , A^-2. Using (IIILBl) . we have that x[m+l+i; Mm+i] e Xeini®{A+BKy+^)®{A+BKy+\ 
By properties of ©, e [IH, [|2l, it follows that x[m + l + i; Mm+i] e X QTZi for i = 0, . . . ,N -2. 
Similar reasoning gives that u[m + 1 + i; Mm+i] G W KTZi for i = 0, . . . ,N — 2. 

The same argument shows that x[m + 1 + A^ — 1; Mm+i] G X^ TZn-i, and there exists 6m+i 
such that (x[m + 1 + A^ — l]Mm+i],dm+i) G i7 by definition. The constraint satisfaction property 
of Vl and the affine transformation property of [|28l . [|29ll imply that n[m + 1 + A^; A^^+i] = 
Kxm[m + A^; A^m+i] + (^ - KA)9m+i G W KTZn-i ^B- By definition A^at C A", and so 
x[m + 1 + A^ — 1; A^m+i] E X Q 1Zn-i- From the disturbance invariance property of Vt, choosing 
the control u[ni + l + N] Mm+i] gives x[m + 1 + A^ + 1; Mm+i] ^ X^ TZ^- This completes the 
proof for part (a). 

Similar arithmetic gives the true, next state x[m + 1; A^m] = x[m + 1; Mm\ +w[m] where w[m] = 
g{x[m], u[m]) — Ae[m] G V. Since Mm is a feasible point, it holds that x[m + 1; Mm] E X QV. 
This implies that x[m + 1; A^m] = x[m + 1; A^m] + w[m] G (A" "D) "D C A' by the properties 
of 0, [|28l, [|29l. Noting the effect of estimation error x[m + l] = x[m + l] Mm] + e[m + 1] with 
w[m] + e[m + 1] E V, a similar argument gives x[m + 1] E X. Consequently, part (b) follows. ■ 

Remark. The result applies irrespective of ipm, Om, and so it applies to the case where ^m, Om are 
time- varying. This allows changing the set point of the MPC [36]. 

We have shown robust feasibility of the general MPC scheme (flOl) without saying anything about 
its convergence. Showing convergence requires placing additional conditions on the system, and it 
is the subject of the next subsection. However, we can show robust stability properties under fairly 
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general conditions. We focus on the following notion. 

Definition 1 (Jiang and Wang [|27ll ). A system is input-to- state stable (ISS) about xq if there exists 
a type-/C£ function /3 and a type-/C function 7, such that for each initial condition x[0] and all 
disturbances d[n], it holds that \\x[k] — xo\\ < /3(||x[0] — Xo||, k) + 7(-D), where D = max„ ||(i[n]||. 

Remark. This intuitively means that if a controller for the unperturbed system converges to a^o, then 
the same controller applied to a perturbed system converges to a point whose distance from xq is an 
increasing function of the perturbation. 

In our case, the control is given by a minimizing point A^^ to the MPC (fTOl) . It is explicitly 

u[m; Ml,] = Kx[m] + c[m; M*^]. (14) 

The closed-loop system with this control is ISS about the equilibrium xq. To prove this, we first 
show that the value function Kn(a;[m]) of the MPC (fTOl) has useful properties. 

Lemma 2. Let Xp be the feasible region of the MPC diOl ). Ifi'm, Om are continuous, then Vm{x[m]) 
is continuous on int{Xp). 

Proof: We define a cost function ^^ and constraint function cf) such that the MPC (39) can be 
rewritten as 

min '?/'m (6*, 5;[m], c[m], ..., c[m + A^ — l])s.t. (c[-], 6') G 0(x[m]). (15) 

c[-],e 

The proof proceeds by showing that both the objective il)m and constraint are continuous. Under 
such continuity, we get continuity of the value function by the Berge maximum theorem [|38l . 

By abuse of notation, we consider a;[m + i] in (fT2l) to be a function of x\m + i — \\ and u[m + z — 1]. 
The function ^^ is formed by composing (flOl) . (fTTI) . (fT2l) . and (fT3l) . Since the composition of 
continuous functions is continuous [|39ll , it follows that ^/'m is continuous. 

Next, noting that x[m + i\ = {A + BKYx[m\ + Xlj=o(^ + BKyBc[m + j] and M[m + i - 1] = 
K{A + BKy~'^x[m]+c[m + i-l] + Y!j'=QK{A + BKy Bc[m+ j], we can rewrite the constraints ([H]) 
as a set-valued mapping (/)(x[m]) = {c[-], ^ G M : Mix[m] + M2Ca G Ua] {M^x[m] + M4Ca, ^) G fii}, 
where Ca = (c[m]' ... c[m + N — I]') ■ Following results in [|35l and references therein, (p is 
convex and Lipschitz continuous in the interior of its domain. ■ 

Remark. This result is somewhat surprising because a general, nonlinear MPC problem has dis- 
continuous value functions. The reason that we have a continuous value function is that our active 
constraints are linear equality constraints or convex, bounded poly topes. 
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Having established that the value function ^^(^[m]) is continuous, we now specifically consider 
its dynamical properties. Vm is a Lyapunov function if the following conditions are met 

a) Lower bound: ijjm > ai(||x[m] — xo||), where ai is a type-/C function; 

b) Unique minimum: Vm{xo) = and Vm{x) > 0,Vx 7^ xq; 

c) Continuity: V^(x[m]) is continuous on mt{Xp) and G int(AV); 

d) Descent: Vm+i{x[m + I]) — Vm{x[m]) < for x[m] ^ xq. 

Under these conditions, there also exists a type-/C function ^2 such that ai( II x[m]—xo II) < Kra(a^["^]) < 

a2(||£["^] -a^oil) [ESI- 

We can finally establish ISS of the closed-loop system with feedback (fT4l) . 

Theorem 2 (Grimm, et al. [|35l ). The closed-loop system with feedback rf7?l) is ISS, under the 
assumptions of Lemma |2] and the Lyapunov assumptions on the value function. 

Remark. The result in [|40l is similar, but it requires Lipschitz continuity which is a stronger condition. 

C. Provable Tracking 

The previous subsection contained abstract conditions to guarantee convergence. We can provide 
simpler conditions in more specific cases, and that is the focus of this subsection. In particular, we 
focus on tracking to the points Xg E Xs, where Xg = {A9 : 3x s.t. {x,6) E fi}. As discussed in 
Sect. IIII-A[ there is a corresponding Ug and y^ for each such point. We do this tracking in the linear 
model ^ with disturbances of unmodeled dynamics and measurement noise. 

Consider a specific instance of the MPC (flOl) in which Om = 0. This is just a linear MPC, albeit 
with redundant constraints because x = x. Let P, Q, R, T be positive definite matrices and define 
the cost function to be 

V'm = \\x[m + i]- A9\\l + \\xg - A9fj, + J^t'o' H^ + ^] " ^^IIq + ll^t^ + ^] - ^^lll- (16) 
This is the cost function used in ll36l. 



Theorem 3 (Limon, et al. [36]). If Q, R, T are positive definite matrices, {A + BK) is Schur stable, 
and P solves the discrete-time Lyapunov equation 

{A + BK)'P{A + BK)-P = -{Q + K'RK)- (17) 

then the assumptions of Lemma |2] and the Lyapunov conditions of the value function are met. 
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Proposition 1. Under the assumptions of Theorem \^ the closed-loop system with with feedback ([74 
generated by the linear model is ISS. 



Remark. The significance of this construction and result is that the linear MPC for tracking 11361 has 
inherent robustness properties. This is in contrast to ifTTl in which the formulation is robustified by 
varying the initial condition of the MPC and considering the tube of trajectories formed. 

Now we define a nonlinear MPC which uses the oracle Om, and it is constructed in a manner 
to ensure that it is ISS. The idea is to use the value function for the linear MPC y(a;[m]) which 
tracks Xg as the cost for the nonlinear MPC, and note that we drop the m subscript because it is 
time-invariant. More specifically, we choose 

iP^ = Vixlm + 1]) = V{Ax[m] + Bu[m] + 0„(x[m], u[m])) (18) 

as the cost function of the nonlinear MPC with oracle. The intuitive idea for this choice of the 
cost function is that the value function of the linear MPC effectively collapses the cost of an A^ 
step problem into the cost of a step problem, and so there may be computational savings for the 
nonlinear MPC with oracle when using cost (fTSi) at the sacrifice of some performance. The following 
result shows that this choice of cost function is robust. 

Theorem 4. If the oracle is bounded \\Om{x,u)\\ < w and the assumptions of Theorem\3}hold, then 
the closed-loop system with feedback 47?1) generated by the oracle MPC is ISS. 



Proof: To differentiate between the control generated by the linear and oracle MPC, let A^J^ 
and A^™ be minimizers for the linear and oracle MPC respectively. We will have three different 
state values at time m + 1. The first x[m + 1; Ai^] is oracle model with nonlinear MPC control, the 
second x['m + 1; A^^] is oracle model with linear MPC control, and the third x[m + 1; A^J!^] is linear 
model with linear MPC control. This is summarized as x[m + 1; Ai'^] = Ax[m] + Bu[m] Al™] + 
Om{x['m],u[m;M';^]), x[m + l;M'^] = Ax[m] +Bu[m;M^] + Cm(£[m],M[m; Al;^]), and x[m + 
1; M*J,] = Ax[m] + Bu[m; M*S 

Because V is the value function of a convex optimization problem, it is known to be convex and 
Lipschitz continuous in the interior of its domain [[T4|. Il38l . Let L be its Lipschitz constant, and 
exploiting this property gives 

\V{x[m+l-M*^]-V{x[m+l-M*^])\ < L\\i[m+1; M*^]-x[m+l; M*^]\\ = L\\Om{£H,u[m;MZX 

(19) 
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The proof finishes by noting that A^^ is suboptimal for the nonlinear MPC. Mathematically, this 
is represented as V{x[m + 1; M'^]) < V{x[m + 1; A^^]). Combining this with (fT9l) gives 

V{x[m + 1; M™]) - V{x[m]) < V(x[m + 1; M*J,]) - V{x[m]) + L||0^(:r[m], u[m; M*^])\\. (20) 

This is ISS stable by ^ and noting that Theorem H gives V{x[m + 1; M*J,]) - V{x[m]) < for 
x[m] 7^ Xg. ■ 

Remark. This result applies to all bounded oracles. It is a mixed result because while it does not 
say whether the oracle improves the MPC performance, it says that the oracle does not significantly 
worsen the performance. Additional conditions on the oracle are required in order to be able to show 
that it improves performance. 

IV. The Oracle 

In the previous discussions, the oracle has been an abstract object. If it has certain features, such 
as boundedness or probabilistic convergence, then we will show that the corresponding MPC with 
oracle is ISS or epi-convergent. In this section, we talk about the oracle in concrete terms. We give 
two examples of oracles, and we show that these oracles have the desired properties. 

The first oracle is a parametric (possibly nonlinear) function, and it can be either linear or nonlinear 
with respect to the parameters. The second oracle is a modified version of the Nadaraya- Watson (NW) 
estimator, a 0-th order local polynomial regression (LPR) filter. Our modification of the NW estimator 
ensures that it is sufficiently differentiable. This is important for practical implementations in digital 
hardware because many optimization algorithms have difficulty in dealing with functions which are 
non-differentiable. 

We begin by introducing the concept of epi-convergence. 

A. Stochastic Epi-convergence 

The oracle Om is time-varying, and so at each point in time m we are solving a different MPC 
problem. That is, we are actually solving a sequence of MPC problems. A natural question to ask 
is whether the control policy generated by this sequence of MPC with the oracle converges to the 
MPC which uses the true system model. If this convergence occurs, then it shows that the MPC with 
oracle is a reasonable procedure. 

Showing convergence of the minimizer of a sequence of optimization problems is not straightfor- 
ward. One approach is to prove that the objective and constraints of the optimization problem converge 
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in the specific way. Regular convergence is not enough, but a related notion of epi-convergence is 
enough (though strictly speaking, we only need lower semicontinuity in probability). A detailed 
discussion of epi-convergence is beyond the scope of this paper, but the interested reader can refer 
to (301, USD. 

For our purposes, it is enough to cite the relevant results. If the cost function -ipm composed with 
the oracle Om{x,u) converges in the appropriate manner to ipm composed with the true dynamics 
g{x,u), then we get convergence of the minimizers of the MPC with oracle to those of the MPC 
with true model. 

Theorem 5 (Theorem 4.3 [31]). Let ip^n cind (j) be as defined in Lemma^ and define 'ipQ to be the 
composition of dTOl) with both HTD and x[n + l] {x [n] ,u[n]) = Ax [n] + Bu [n] +g{x[n],u [n]). We also 
need to define the set Xp = {x[m] : (j){x[m]) ^ 0} that represents the points for which the MPC ( fTOl ) 
has a feasible solution. If ip^ ^ V'o for all x [m] G Xp, then the set of minimizers converges 

4>{x[m]) 

argmin{'?/'m|(c[-], 6*) G 0(x[m])} — ;■ argmin{-?/'o : (c[-],6') G 0(x[m])}. (21) 



Remark. Our statement of the theorem is simpler than that in OTII because of two reasons. First, ipo is 
continuous by assumptions on the cost function and dynamics of the system. Second, our constraint 
set is fixed because it is independent of the oracle. 

B. Parametric Oracle 

We begin by discussing the case in which the oracle is linear with respect to a set of unknown 
parameters. This structure is very commonly found in the adaptive control literature |[l|-|l3l. Let 
Aj G 7i be a set of parameters. Then the oracle is 0{x,u]6) = ^^ AjXi(a;,M), where Xii^^u) are 
functions that are allowed to be nonlinear. Obviously, a special case of this is the linear model 
0{x, u] 6) = A\x + B\u, where A\, B\ are matrices whose entries are parameters. 

The case in which the oracle is nonlinear with respect to a set of unknown parameters is similar. 
Again let Aj G 71 be a set of parameters. The oracle is given by 0{x, u; A) = x{^^ u; A), where x is 
nonlinear in all of its arguments. 

Even though the linear case is a subset of the nonlinear case, it is important to distinguish the 
two because they require different regularity conditions for statistical convergence [|4T]| - [|43l . For the 
linear case, the situation 71 = M is allowed. In the nonlinear case, 71 needs to be a bounded, convex 
poly tope. 
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The parameters Aj are initially unknown, and they are estimated using the trajectory data. It is 
customary to use a least-squares cost function to estimate the parameters Aj as follows 

A'" = arg min ELil^fc - ^(^fc! ^))V^ (22) 

where Xi = [x[i]' u[i]') and Yi = x[i + 1] — {Ax[i] + Bu[i]). The corresponding oracle is given 
as Om{x,u; A™), and it has convergent behavior under appropriate assumptions. 

Theorem 6. Suppose there exists A° G 71 such that g{x,u) = 0{x,u; A°). For the nonlinear case, 
assume that the estimate A™ is continuous with respect to Xi, Yi. If regularity conditions for the linear 
(nonlinear) case given in ^43\l ( 1^41^. S2l) <^re satisfied, then ipm ^ "ipo for all x[m] G Xp. 

0(£[m]) 

Proof: The proof simply requires application of existing theorems. If 6'" converges in probability 
to 9^, then the result follows by Proposition 2.1 of [44]. All we have to show is that the least squares 
estimate is consistent, but it does not come for free because we have an errors-in-variables situation. 
Specifically, we have to presmooth the measurements to get estimates Xj, Y^ of the true values Xi,Yi. 

The linear case has a well known explicit solution A™ = {X'X)~^X'Y p43|, where X,Y are 
matrices formed by stacking Xj, Y^. This solution is continuous in the X^, Yi under the regularity 
conditions of having a nonsingular sample covariance matrix [|43l . The continuous mapping theorem, 
in conjunction with Theorem [12] and consistency of the linear case ["431, leads to convergence in 
probability of A" to A. 

In the nonlinear case, the continuity assumption means that we can directly use the continuous 
mapping theory with the consistency of the nonlinear case [|4TI . [|42l . Combining these results gives 
convergence in probability of A™ to A. ■ 

Remark. There are two important notes. First, we only reference the assumptions because the regular- 
ity conditions are beyond the scope of the paper. The interested reader can refer to [[Tl, Pni - [|43l for 
details, though from an applications standpoint, it is uncommon to violate the regularity conditions. 
More importantly, our MPC scheme provides safety even in the scenario where these conditions are 
violated and the oracle misbehaves. Secondly, in general A"^ is not continuous with respect to Xi,Yi. 
Proving the result for this case is a difficult open problem, and a possible direction is the use of the 
epi-convergence results in [|45l , [|46ll . 
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C. Nonparametric Oracle 

In this subsection, we modify the NW estimator to define the oracle which returns the value 
g{x, u) and gradient Dg(x, u) of the unmodeled dynamics at state x and input u. The NW estimator 
is defined as the ratio between two quantities, and it can be thought of as the interpolation of non- 
uniformly sampled data points by a suitably normalized convolution of a kernel with the data points 
represented as Dirac delta functions. It runs into problems in practice because the denominator can 
be zero when there is an insufficient number of measurements. This is not a problem theoretically 
because typical convergence results [fTTTl . [fT2l are asymptotic in nature. To get good finite sample 
behavior when there are few measurements, we define a regularized NW estimator. This modification 
has good statistical and deterministic properties. 

To state the oracle, we use the following notation 

Xi= [x[{\' n[z]')' (23) 

Yi = x[i + l]- {Ax\i\ + Bu[i]) (24) 

S, = (e-X,)'(e-X,)//i2, (25) 

where Xj G W~^"^ and Yi G MP are data and ^ = ix' u'] are free variables. The function K : 
M — 7> M+ is called a kernel function. It has finite support K{u) = for \u\ > 1, even symmetry 
K{u) = K{—u), positive values K(u) > for \u\ < 1, and differentiability. With these definitions, 
we define the regularized NW estimator as 

g{x, u- X„ f,) = ,^'3:^^^:\ , (26) 



where A G M+. If A = 0, then (1261) is simply the NW estimator. The purpose of the A term is to 
regularize the problem and ensure differentiability. 

There are two alternative characterizations of (l26l) . The first is as the unique minimizer of the 
following parametrized, strictly convex optimization problem 

g{x,u]Xi,Yi) = aigmin L{x,u, Xi,Yi,'y) 

(27) 

L{x, u, X„ F„ 7) = Ei KiE,)i% - 7)2 + X^'. 

Viewed in this way, the A term represents a Tikhonov regularization P7l , ||48l . The second charac- 
terization is as the mean with weights {A, K{Ei), . . . , K(E„)} for points {0, Fi, . . . , F„}, and it is 
useful for solving the second part of the following theorem. 
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1) Deterministic Properties: 

Theorem l.IfOe£,Oe W, g{x,u) e W, K{-) is dijferentiable, K{-) > 0, and A > 0, then 

a) g{x,u;Xi,Yi) is dijferentiable with respect to x,u,Xi,Yi; 

b) g{x, u; Xi, Yi) eW®£® -AS. 



dL, 



Proof: The estimate g(x,u;Xi,Yi) can be represented implicitly as the value of 7 that solves 
X, u, Xi, Yi, 7) = 0, where L(-) is from (|271 ). Under the assumptions, the hypothesis of the implicit 
function theorem is satisfied because A + J2i ^C^'i) > 0- The result in part (a) directly follows from 
the implicit function theorem. 

Part (b) is shown by noting that the assumptions and Lemma [6] imply that 0, Fj G W (B£ (B —AS. 
If the weights of a weighted mean are positive and have a nonzero sum, then the weighted mean can 
be written as a convex combination of points. This is exactly our situation, and so the result follow 
from the weighted mean characterization of the estimator (l26l) . ■ 

Remark. There are two distinct remarks. First, this is a deterministic result for a finite number of 
samples. We can compute the gradient using standard calculus, and its jA;-th component is given 
by (l28l) for fixed Xi, Yi. Second, there are two ways to estimate the gradient of a function. We 
can estimate the function and compute the gradient of the estimate, or we can directly estimate the 
gradient. It is known that the former generally performs less well than the latter [49J, and that local 
linear regression techniques [|T2ll do well in computing the latter. However, because we are interested 
in numerical computations, it is the former that is needed in our case. This is a subtle distinction. 

.. /., _ {J:^{y^h ■ dKjE,) • H. ■ [e - X.],}{A + E. KjEi)} - {EMKi^^)}{J:^ dKjE,) ■ E, ■ [^ - X,],} 

(28) 

2) Statistical Properties: Theorem |7] shows that the regularized NW estimator (|26l) has good 
deterministic properties. Showing that it has good statistical properties requires additional work. We 
begin by proving a uniform version of a theorem that is called either the continuous mapping theorem 
ll50l or Slutsky's theorem [51 J, depending on the author. 



Lemma 3. Given random variables {Vk}, V eV such that \\Vk — V\\ = Opijk); if L{x, t>) : A* x V — )■ 
M is a continuous function and X , V are compact sets, then sup^. \L{x, Vk) — L{x, V)\ = Op{rk). 

Proof: The Heine-Cantor theorem (Theorem 4.19 in ||39l ) gives uniform continuity of L{x,v) 
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on A* X V, and this implies that for all x, || V^ — l^|| > 6 > whenever \L{x, Vk) — L{x, V)\ > e > 0. 
Proceeding analogously to [i50il . we have 

P(sup \L{x, Vk) - L(x, l^) I > e) = ¥{3x : |L(x, \4) - L{x, V)\ > e) <¥{\\Vk -V\\ > 6). (29) 

X 

The result is immediate. ■ 

We can now show the first convergence result for (|26l) . The intuition of why this result is true is 
that though noise in Yi and Xi is correlated, our presmoothing makes this correlation asymptotically 
insignificant. This result can be interpreted in an instrumental variables context fl33l . [[34l . 



Corollary 2. // \\Xi - Xi\\ = Op{rk), then sup^._„ \\g{x,u]Xi,Yi) - g{x,u,Xi,Yi)\\ = Op{rk). 

Proof: Define a random variable Vk = ix[ ... X'^ Y( ... YJ^) , and let V be the corre- 
sponding limiting vector. Equation (|24l) and the corollary's assumption imply that ||yj — Fj|| = Op{rk), 
and so \\Vk — V\\= Op{nrk). 

Now consider the functions r] = J2i YiK{Ei)/n, 5 = J2i K{Ei)/n, and p = r]/{\/n + 5). Applying 
Lemma |3] gives, sup^.„ \\r]{^] Vk) - r/(^; V)\\ = Op{rk) and sup^^„ \\6{^; Vk) - 6{^; V)\\ = Op{rk). 
Another application of Lemma [3] gives sup^.^ i|p(^; Vk) — pi^', V)\\ = Op{rk). The result follows by 
noting that g = p. ■ 

Remark. The variance of the NW estimator in its typical setup is known to uniformly converge at a 
rate no faster than l/n'^/^^"'"'^-* [12]. Our result gives a nonstandard rate of convergence r^, because 
we have a time- series setup with presmoothing. 

Convergence of an estimator is often studied by decomposing the estimation error into a bias and 
variance term. For proving convergence of our regularized NW estimator, we have to be careful in 
defining the probabilistic framework before we can decompose the error into two terms. The Xi 
values are not independent variables drawn from some probability distribution. They are exactly 
the states of a deterministic system, as it evolves in time. In fact, if the control inputs u[k] are 
(deterministically or statistically) known for each point in time, then X^ and Xj are dependent for 
all values where i > j. 

The standard approach to converting a time-series into a stochastic setting is to assume some form 
of ergodicity or mixing. Conditions to ensure this for a general, nonlinear system are exceedingly 
difficult, and so we consider a related notion. Let Bh{x) = {y : \\x — y\\ < /i} be a ball centered at 
x with radius h. A finite sample cover of A' is a set Sh = \JiBh/2{Xi) that satisfies X C Sh- The 
intuition is that {X,} sample X with average, inter-sample distance less than h/2. 
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Lemma 4. Let g{x,u) be Lipschitz with constant L. If Sh is a finite sample cover of Z C X xU, 
then sup(^,u)g2 Woi^j'^) ~ 9{x,u;Xi,Yi)\\ < fiMg + (1 — fi)Lh, where jj, = A/(A + 7^(1/2)) and 
Mg = max ||a;|| : x E X. 

Proof: Define / = {i : Hj < 1}, where Sj is from (|25l) . and note that K{'E.j) = for j ^ /. 
The regularized NW estimator is g{x, u; Xi, Yi) = wq ■ + J2iei ^« ' ^«' where wq, Wi > 0, Wi = 
K{Ei)/(\ + J2j ^C^j))^ ^^'^ ^0 + J2i^i — 1- The finite sample cover property of Sh implies that 
'Zj^i-j) > ^(1/2). Noting Wo < A/(A + K{l/2)) and Yi = g{Xi), the result follows from the 
triangle inequality. ■ 

Remark. The result shows that the regularized NW estimator in our setup has bias 0{\ + h), where 
A = 0{h). This matches the bias of the NW estimator 0(h) in a standard setup at both interior and 
boundary points [fTTH. [IT2]l. 

Theorem 8. Assuming Sh„ is a finite sample cover of Z C X x U, for some sequence hn — )■ 0; 
A = 0{hn); and k is a sequence such that T^/k — ?■ (see Theorem [72]) .• then the regularized NW 
estimator is uniformly consistent on Z and converges at rate 

sup ||^(x,M)-^(a;,n;X„F,)|| =0(A + /i„) + Op(A;-(''+i)/(2^+3)). (30) 



Proof: Using the triangle inequality, the left-hand side of (|30l) is bounded by 

sup \g{x,u\Xi,Y^-g{x,u\Xi,Y^\^ sup \g{x,u) - g{x,u\Xi,Y^\ (31) 

This first term is controlled by Corollary |2] and Theorem [121 and the second is governed by Lemma 
g] ■ 

Remark. Ideally, we would like Z = X x U, but this does not always happen. It requires that the 
trajectory of the system sufficiently explores the space in a manner formalized by the definition of 
finite sample cover. A set Z which meets the assumptions of Theorem [8] always exists, and this can 
be shown by construction: Given any n > 0,\et Z = U^^^Xj. A better set Z is defined as the limit 
of a convergent subsequence of Xi, and its advantage is that the Xi visit a neighborhood of the limit 
infinitely often. Such a limit is guaranteed to exist by the Bolzano-Weierstrass theorem. These two 
constructions mean that there is always some set on which the nonlinear dynamics g{x,u) can be 
learned, and this set corresponds to points which the trajectory visits. 

We need the following theorem in order to show epi-convergence of ipm as defined in Lemma [2] 
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and using the regularized NW estimator (|26l ) as the oracle. 

Theorem 9. Assume that we have a sequence of functions Vk{x), Wk{x) which converge in probability 

to V{x), W{x) as sn^i^^x^ \Vk{x) —V{x)\ = Op{rk) and sup^g^i^^ |W^fc(a;) — Vr(a;)| = Op{sk), and let 

^v '^ {x E Xy : V{x) G X^}- If Xy, X^ C M" are closed and compact sets; dX^j is the boundary 

of Xw,' W is Lipschitz continuous with constant L^; A*^ C X^; 5 > 0, where 5 = ini x(^x* \\x — y\\; 

yedXm 
then sup^g;^j \Wkiyk{x)) - Wiy{x))\ = Op{ck), where Ck = max{rfc, Sfc}. 

Proof: Applying the triangle inequality gives 

P(sup \Wk{Vk{x)) - W{V{x))\/ck > e) < P(sup \Wk{Vkix)) - W{Vk{x))\/ck > e) + 

x&X* x£X* 

P(sup \W{Vk{x))-W{V{x))\/ck > e). (32) 

x£X* 

Considering the first term on the right in terms of the conditional probabilities of Vk{x) lying or not 
lying in X^, it can be bounded as 

P(sup \WkiVkix))-WiVkix))\/ck > e) < P( sup \Wkix)-Wix)\/ck > e)+P(3x G X: : Vkix) i X^ 

(33) 
Yet we have that P(3x G X* : Vk{x) ^ X^) < Fisn^^^x; \yk{x) - V{x)\ > 5), and so the limit 
of (l33l) by assumption is lim P(sup^g;^-. \Wk{Vk{x)) — W{Vk{x))\/ck > e) = 0. The second term on 
the right in (|32]) is bounded using the Lipschitz constant as ^'{^wpx^x* \^{^k{x)) — W{V{x))\/ck > 
e) < F{sup^^x, Lw\Vk{x) — V{x)\/ck > e), and again taking the limit gives by assumption that 
limP(sup^g;^j \W{Vkix)) - W{V{x))\/ck > e). The result follows by taking the limit of ^ and 
observing that the limit is equal to zero. ■ 

Remark. The theorem shows that convergence in probability is preserved under composition, when 
the convergence in probability is of the form above. The one subtlety in the result and subsequent 
proof is the issue of domains of convergence. We are composing two functions Wk{Vk{x)), and there 
is no guarantee that the range of the function on the inside Vfc(-) is within the domain of convergence 
of the function on the outside Wk{). This is why we require the technical conditions on the domains. 
This ensures that each function is given a suitable domain on which it converges. 

With the previous theorem, we can now show that ipm epi-converges. 
Theorem 10. Suppose the assumptions of Theorem \8\ hold. If Z = X xU and W © B{0, 5) '^V for 
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some 6 > 0, then ipm ^ "j/'o /o^ 6[^/ x[m\ G A'f- 

(/){a;[m]) 

Proof: Note that (fT2l) recursively defines tlie x[m + i], for i = {0, . . . , A^}, as functions of 
only x[m] and c[-]. For example, the equation for x[m + 2] is given in (|34|) . Applying the same 
argument as we used to prove robust constraint satisfaction in Theorem \T\ gives the result that 
x[m + i] E X Q B{0,5). Thus, the set-related conditions of Theorem |9] are satisfied. Consequently, 
by Theorem |9] we have that sup^r^ig;^.^ \\x[m + i]{g) — x[m + i]{g)\\ = 0.p{rm), where r^ is the 
convergence rate from Theorem [8l Since tp is continuous, we can compose it with x[m + i] using 
Theorem m This gives that sup^j^jg;^^ l^Jm - V^o| = Op{rrr,). 

x[m + 2](x[m], c[-]; g) = A^xlm] + AB^Kxlm] + c[m]) + Ag^xlm], Kx[m] + c[m]) + B{K{Ax[m] 
+B{Kx [m] +c[m] ) +g {x [m] , Kx [m] +c[m] ))+c[m-\-l]) +g{Ax [m] +B{Kx [m] +c[m] ) +g {x [m] , Kx [m] 
+ c[m]), K{Ax[m] + B^Kxlm] + c[m]) + g{x[m], Kx[m] + c[m])) + c[m + 1]), (34) 

The last step requires showing that this condition is equivalent to lower semicontinuity in probability. 
Because ^o is continuous, given e > and a point Xo,c[-], there exists a neighborhood f/{xo,c[-]} 
such that 

|V^o(C)-^o(a:o,c[-])|<e/2, (35) 

for all C £ t/{xo, c[-]}. Now consider the expression 

a = P( inf ^m <^o(a;o,c[-]) -e) < P( sup |^„ - ^o(a;o, c[-])| > e). (36) 

Using (|35l) . we can further bound the expression above by a < P(sup^g^{3,y^j.]} \i>m — ^o| > e/2). 
Taking the limit, we have that Hma = 0, and so the result follows by applying Proposition 5.1 of 
lED. ■ 



Remark. Note that the 5 > can be made arbitrarily small, but we cannot have 5 = 0. This is 
an artifact of our proof technique and mode of convergence in probability. The condition could be 
relaxed with 5 = 0, if the convergence is tightened to almost sure. However, showing almost sure 
convergence is harder and so not considered here. 

3) Implementation: Computing the regularized NW estimator (|26l) requires choosing two parame- 
ters. In principle, they can be chosen in a data-driven manner using cross-validation or bootstrapping 
[I52l - ll54l . Unfortunately, this approach is not feasible in practice. Picking these parameters in a 
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data-driven manner is too slow for on-line implementation in many applications. More subtle are 
the numerical issues. As we have mentioned, the estimate g is used as an oracle for a numerical, 
nonlinear optimization algorithm. 

For good numerical behavior, it helps to have differentiability and boundedness of the oracle. If 
A is a small, positive number (e.g., le-5), then the regularity of g is provided by Theorem U\ This 
holds deterministically, regardless of the value of h. 

Choosing h is trickier. As Theorem [8] states, the bias in g decreases at 0(h). This means that h 
should be chosen as small as possible, while preferably ensuring that Sh is a finite sample cover. This 
is difficult to do on-line, and so we suggest a rule-of-thumb h^ = K ■ ri~^/*^P+™\ for some constant 
K > 0. The reason for this choice is because random, independent samples cover X x U C Mp+™ 
atarateof 0(n-i/(p+"^)). 

There is a last point regarding the complexity of computing the regularized NW estimator and its 
derivative. If h is small, then the majority of the terms in the summations will be zero because of 
the finite support of K(-), and so only a small number of terms need to be summed. In fact, the 
lengthiest part of the calculation is determining which terms are and are not zero. Consequently, to 
minimize computation time we recommend in implementations to compute the gradient whenever 
the function value is requested by the numerical optimization algorithm. Our experiences show that 
this leads to speed improvements. 

D. Other Nonparametric Oracles 

Local linear estimators (LLE's) [[TOl , [fT2l and neural networks (NN's) [|5l, (HI are among the other 
nonparametric oracles that can be used to learn unmodeled dynamics. LLE's particularly seem like 
a natural choice for the oracle because they return an estimate of both the function value and its 
gradient Hm or exterior derivative [|10| . However, we have bounds on the value of the unmodeled 
dynamics, and LLE's and NN's are not guaranteed to respect such bounds. This means that MPC 
with these oracles may not be ISS, though trajectories will remain within state and input constraints. 
We can define extensions of these methods which respect bounds, but then these extensions will 
generally be not differentiable at certain points [|55l . Whether these issues are important in practice 
is still an open question that we have not examined. 

V. Example: Moore-Greitzer Compressor Model 

The compression system of a jet engine can exhibit two types of instability, known as rotating 
stall and surge [|56l - ll58l . Rotating stall is a rotating region of reduced air flow, and it degrades 
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the performance of the engine. Surge is an oscillation of air flow which can damage the engine. 
Historically, these instabilities were prevented by operating the engine conservatively. However, in 
the desire for better performance, active control schemes which allow for less conservative operation 
have been studied and implemented [|57ll , [|58l . 

The Moore- Greitzer model is an ODE model which describes the compressor, and it is able to 
accurately predict the instabilities. The model which describes the surge instability is given by 

$ = -^ + ^^+l + 3$/2-$V2 (37) 

^ = ($ + 1 - uV^)/f3\ (38) 

where $ is mass flow, \1' is pressure rise, /3 > is a constant, and u is the throttle opening. The 
throttle opening u is the control variable, and we assume that it is controlled by a second order 

2 

actuator with transfer function u{s) = g2,2(Z"s+w2 ''^^^^' where ( is the damping coefficient and w'^ 
is the resonant frequency. In this model, r is the input which controls the throttle opening u. 

We conducted simulations of the control of this system using either linear MPC or our MPC 
scheme with oracle. For the simulation, we used the parameters /3 = 1, \E'c = 0, C = 1/v^' ^^^ '^n = 
VlOOO. Furthermore, the system was linearized about the equilibrium xq = ( $o \E'o uq mq ) — 
( 0.5000 1.6875 1.1547 o) • We placed constraints on the state and actuator in our simulation. 
The state constraints were < $ < 1 and 1.1875 < ^ < 2.1875. The actuator constraints were 
0.1547 <u< 2.1547 and -20 < m < 20. The input constraints were 0.1547 < r < 2.1547. 

Both the linearization and exact discretization, with sampling time T = 0.01, about the point 
Xq are unstable. Denote the discretization as x[n + 1] = Ax[n] + BSr[n], where the state is x = 
[6^ 6"^ 6u Sii) and the control is r[n] = 6r[n] + tq with tq = Uq. We picked a feedback 
matrix K = [-3.0741 2.0957 0.1195 -0.009o) which stabilizes the system. This particular K 
was chosen so that the poles of the closed-loop system x[n + 1] = (A + BK)x[n] were placed 
at {0.75,0.78,0.98,0.99}, and these poles were chosen because they are close to the poles of the 
open-loop system, while still being stable. 

We compared the performance of a linear MPC and nonlinear MPC with the regularized NW oracle 
for regulating the system about the operating point ( $q \I/q uq uq) , by conducting a simulation 
starting from initial condition ($o — 0.35 "^q — OAO uq uq) ■ The horizon was chosen to be 
N = 100, and both MPC schemes use the cost function 'ip = ||x[m + i] — a^ollp + X]j=o \\^[i^ + 
i] — xq\\q+ \\f[m + i] — rolllj, with Q = 1^, R= I, and P which solves the discrete-time Lyapunov 
equation (fTTI) . 
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Fig. 1. The states and control of the nonlinear MPC with oracle (solid blue) and linear MFC (dashed red) are shown. The nonlinear 
MFC with oracle provides faster regulation to the operating point (dotted black). 



The oracle used the bandwidth h = 0.01, and the data used by the oracle was taken from 
measurements of the system under linear MPC. Such pre-training of the oracle leads to improved 
performance. It is also worth noting that the oracle only used three states Xj = i^[i] '^[i] u[i]) 
to estimate g{x, u); incorporation of this kind of prior knowledge leads to improved performance by 
reducing the dimensionality of the estimation. 

The result of this simulation is shown in Fig. [H The nonlinear MPC with oracle provides faster 
regulation to the operating point. This does come at the cost of increased computational time. The 
linear MPC required 0.25s on average to compute the control at each point in time. On the other hand, 
the nonlinear MPC with oracle took an average of 3.0 s in computing the control at each point in time. 
The nonlinear optimization is the rate-limiting part of the computation and not the regularized NW 
estimator. Both implementations are MATLAB code which uses the SNOPT solver [|59l to compute 
the MPC, and it may be possible to optimize this code by using another programming language. The 
polytope computations were aided by the Multi-Parametric Toolbox (MPT) [[60l . 



VI. Conclusion 

We have described a learning-based MPC scheme with deterministic guarantees on robustness 
and safety. Our scheme uses a linear model with bounds on its uncertainty to construct invariant 
sets which help to provide the guarantees. It uses a general oracle which returns the value and 
gradient of unmodeled dynamics at discrete points. We constructed a new estimator which we call 
the regularized NW estimator and proved it is deterministically differentiable. This is important 
for implementations in digital hardware because numerical optimization algorithms have difficulty 
dealing with non-differentiable functions. Using the regularized NW estimator as the oracle for the 
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learning-based MPC scheme provides a method which we prove converges to an MPC which knows 
the unmodeled dynamics. 

Future work in this method will likely proceed in three directions. The first is the extension of 
this method to other classes of models in which the appropriate invariant sets can be computed. The 
second is the study of different oracles. The third is research on speeding up the computations. Our 
scheme can be applied to psuedo-spectral methods, and this may provide faster computation times. 
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Appendix 

The first part of this appendix motivates why we measure all states. Suppose the unmodeled 
dynamics are of the form 7(0;, u; A) + K, where i^ is a constant, non-zero vector and 7(0;, u; A) is 
a parametrized nonlinear function such that 7(0;, m; Aq) = for some parameter value Aq. We show 
that a filter cannot learn these unmodeled dynamics while doing state-estimation, unless all states are 
measured. This negative result is not true for all systems, and it is possible to do both for systems 
with specific unmodeled dynamics. 
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The second part of the appendix discusses state filtering when all states are measured. State filtering 
is needed to deal with the case in which the measurements have noise because such noise leads to an 
errors-in-variables situation. This situation is problematic because it is known that standard estimates 
of unmodeled dynamics will be asymptotically incorrect [|32ll . Control engineers have developed 
techniques, including the Kalman filter, to deal with this ||33l , ||34l . 

The Kalman filter requires a parametric representation of the unmodeled dynamics, and this is not 
always available. Instead, we use a local polynomial regression (LPR) filter which requires less prior 
knowledge. The piecewise constant input in MPC is incompatible with existing regularity conditions 
for LPR filters on the smoothness of states and inputs. To remedy this, we define a bi-level sampling 
scheme that provides enough smoothness. 



A. Limitation of Filters 

We begin with a negative result concerning filters and their ability to estimate both the state and 
unmodeled dynamics. This result concerns a general class of unmodeled dynamics, among which 
series expansions are included. Suppose gc{x,u) = 7(x,m;A) + K, where i^ is a constant, non- 
zero vector and 7(0;, m; A) is a parametrized nonlinear function such that 7(x, u; Aq) = for some 
parameter value Aq. This can be used to define an augmented system with y = Cx and 
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The ability to simultaneously do state estimation and parameter identification is equivalent to the 
observability of (x, K, A). We have the following theorem. 

Theorem 11. A necessary condition for the observability (and detectability) of the system given in 
09\l with y = Cx is that rank{C) = n. 

Proof: Suppose A = Aq, which makes 7(0;, u; A) = 0. Then the system is linear and time- 
invariant (LTI). Using the Popov-Belevitch-Hautus (PBH) test, the system is observable if and only 

if rank(0) = n + n = 2n, for all s G C : Re(s) > 0, where 

fsI-Ar -A 







si 



(40) 
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If s = 0, then the matrices (p and si are both singular, and the block structure of (p implies that 
rank(7) = n + rank(C). The system is not observable (and not detectable) when rank(C) < n, 
establishing necessity. ■ 

Remark. In light of this negative result concerning filters, we require that C be full rank. Without 
loss of generality, we assume that the full state x is measured. 

B. Nonparametric Filtering 

We propose a non-parametric regression approach for estimating the state because the design of 
a Kalman filter for systems with unmodeled dynamics is not straightforward. Available approaches 
include local polynomial regression (LPR) or spline-smoothing, and the Savitzky-Golay filter ||6TI 
is technically a finite impulse response (FIR) filter implementation of LPR. The advantage of our 
non-parametric approach is that the filter will be weighted sums of measurements and so easily 
implemented. 

An important point to note is that the statistical guarantee provided by our method is not the 
same as the Kalman filter. The Kalman filter is defined to be consistent if its state estimates are 
unbiased and the true error covariance is smaller (covariance matrices are positive semi-definite, and 
so a partial order can be defined) than the estimated error covariance. In our method, the consistency 
is with respect to the sampling period Tg of state measurements. As T^ — ;■ 0, the estimates go to 
the real values in probability. This philosophical change is necessary in order to use non-parametric 
statistics, otherwise we would be forced to use a parametric model of the unmodeled dynamics. 

Lemma 5. Suppose gc{x,u) is k — 1-times differentiable. For n G Z, the trajectory x(t) which 
solves the ODE in ([7]) is once-differentiable everywhere, k-times differentiable at t ^ nT^, and not 
twice-differentiable at t = riTu. 

Proof: The first time-derivative of x{t) is given by ([T]), by definition. As is common in MFC, the 
inputs are piecewise constant ^. Thus, u{t) is not differentiable at t = nTu. Because the first time- 
derivative of x{t) is a function of u{t), this means that x{t) is not twice-differentiable at t = nTu. 
Recall that u{t) is constant for t ^ nTu. Thus, u{t) is smooth at t ^ nTu. This implies that x{t) is 
fc-times differentiable at t 7^ nTu, because gc{x,u) is k — 1-times differentiable. ■ 

Remark. These qualitative features mean that we cannot use LFR methods with order higher than 
zero (e.g., local linear regression) without modifying the filtering scheme. This is an important point 
because the continuity of the trajectory x{t) makes it tempting to use LFR. Yet, no theoretical 
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Fig. 2. We use a sampling scheme with two sampling periods. The inputs change at every T^ units of time, and the states are 
measured every Ts units of time. In this example, kTs = Tu with fc = 4. 



convergence guarantees can currently be made in such a situation, and the behavior of these filters 
may be unpredictable. 

In light of these restrictions, we propose a modified sampling scheme. Recall that T„ is the sampling 
time for control inputs, and we define Tg to be the sampling time for state measurements. We require 
that kTg = Tu for some k E Z+, and this scheme is illustrated in Fig. [2] for the case of A; = 4. 
The advantage of this sampling scheme is that the trajectory x{t) is piecewise smooth (infinitely 
differentiable) in between the samples taken at nTu, because the control input u{t) is piecewise 
constant. This allows us to use LPR of order higher than zero (e.g., local linear regression), which 
can give significant improvements in estimation error over 0-th order LPR. 

If the trajectory of the real system is x(t), then consider a measurement model ^{{mTs + nTu) = 
Xi{mTs + nTu) + ei,m E Tj^ : niTg E [0,T„], where Cj are independent and identically distributed 
(i.i.d.) random variables with zero mean and bounded values Ij < [ei]j < Uj. The notation [ei]j 
indicates the j-th component of the i-th noise vector. Suppose that we have made measurements for 
n = 0, . . . , N. This measurement model corresponds to the sampling scheme seen in Fig. [2l 

1) Filter Design: Suppose K{u) is a kernel function, which is a bounded even function with 
finite support. Let /iA,i,n, hp^i^n € M be bandwidth parameters, where A, p denote left and right. Now 
define a diagonal matrix R[n,i] with entries given by R[n,i]m+i = K{niTs/h\in) and a diagonal 
matrix L[n,i] with entries given by L[n,i]m+i = K{{mTs — Tu)/hp^i^n)- Lastly, define the matrix 

r=(n r^„ : r,^J', where r, = (i t ... r). 

We are now ready to design the filter. The filter coefficients are given by 

w^[n,z] = e[ [{T'L[n,z]T)-'T'L[n,z]]^^^ (41) 

Vm[n,i] = e[ [{T'R[n,z]T)-'T'R[n,z]]^^^, (42) 

and ei is the unit-vector with a 1 in the first position and zeros everywhere else. The idea is that 
Wmf^T^] filters on the left side of t = nTu and Vm[n\ filters on the right side of t = nTu. As time 
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advances to t = NTu, we first filter on the left side of ^{NTy) (because there is no right side). At 
the next point in time t = {N + 1)T„ we filter on both sides of ^(NTu). Consequently, the filter is 
time- varying. 

Let superscripts denote the (discrete) time at which the filter is computed. The raw state estimates 
(for times t = nTu, for n = 0, . . . , N) computed at time t = NT^ are given by 

^^[N] = ELo ^n.[N - 1, z]UmTs + {N- 1)TJ (43) 

x^[N - 1] = 1/2 ■ j:l=o^m[N - l,t]UmT, + nT.,) + 1/2 ■ x^-\{N - l)T^) (44) 

xf [n] = xf -^ [n] , Vn < A^ - 1 . (45) 

The state estimates are given by 

xf [n] = min{^i(nT„) - k, max{^i(nr„) - Qi, x^[n]}},'in. (46) 

The operation in (l46l) maintains the bounds on the noise, and it makes sure that the filter saturates 
if it tries to exceed the bounds of the noise. This filtering is well-defined because of the piecewise 
continuity of the control input u{t), and it is consistent in a point- wise sense, as the following theorem 
shows. 

Theorem 12 (Ruppert and Wand, 1994). Suppose that Tu is fixed, and r is the order of the filter If 
fc — )■ oo, then the filter defined in {\43M6\) is consistent Xi[n] — xf^[n] = Opi /c~(''+i)/(2»'+3) j^ where 



- s 



Proof: Strictly speaking, the result in [12J applies to the filter defined in (I43l)- (|42)) . Consistency 



with respect to (1461) is established by noting that the bounds on the noise imply that 



Xj \n\ — x^ \n\ 



Xj \n\ — x^ \n\ 



< 



Remark. Because k = T^/Tg, this theorem intuitively says that the filter performs well as long as Tg 
is much smaller than Tu- 

We also have the following lemma which discusses the finite-sample properties of (|46l) . 
Lemma 6. Under the assumptions delineated above, we have that x[n] G x[n] © £, where £ = {t : 

Proof: Note that (|46l ) enforces that E,i — Qi < Xi < ^i — k, which can be rewritten as x G 
^ © (~{e • ^j ^ [e]j ^ Wj})- The bounds on the noise Xi + k < ^i < Xi + gi are equivalent to having 
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^ e X ® {e : Ij < [e]j < Uj}. The result follows from properties of ©, e lUHl, [|29l|. ■ 

2) Filter Implementation: Because the filter is simply a weighted sum of measurements (I43H45I) . 
the largest difficulty with implementation is in computing the filter coefficients (14 11421) . The first 
step in doing this is to choose the order of the filter. Empirical results show that linear (r = 1) or 
quadratic (r = 2) LPR typically gives good results. For clarity of presentation, we focus here on the 
case of r = 1. 

Having chosen the order of the filter, the next step is to compute the bandwidth parameters 
^A,i,n, hp^i^n- To make the notation compact, let ■ be a blank spot that is either replaced with ■ = p 
or ■ = A. Using results from [|62ll . it can be shown that the optimal bandwidths for r = 1 are 

approximately given by h.^i^n = ( 2^iJr)k ) ' where a = j^ K'^{u) du / (Jj^ u'^K{u) du) and the 
second time-derivative Xi{nT^) is the left-sided derivative if • = A (or right-sided derivative if ■ = p). 
Unsimplified expressions for the cases r > 1 can be found in [[62l . We can approximate the values 
of these second time-derivatives by using ([T]). More specifically, the estimated values are given by 
UnT+) = [Al^inTu) + A,B,u[n]]^ and U^T') = [Al^{nTu) + A,B,u[n - 1]]^. 

Because it is time consuming to compute the filter coefficients (|41H42I) . we suggest an implemen- 
tation in which they are precomputed. Define a set "H = {hi, . . . ,hmax}, and compute the filter 
coefficients for each value in "H. Then, when we would like to filter, we estimate the time derivatives 
Xi{nT^) and Xi{nT^), and use these to compute /i.i,„. The closest value in Ti is selected, and the 
corresponding set of precomputed filter coefficients are used to do the filtering as defined in (I43II45I) . 
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